SMS Normalization: Combining Phonetics, Morphology and Semantics

Authors

  • Jesus Oliva
  • Jose Ignacio Serrano
  • M. Dolores del Castillo
  • Ángel Iglesias
Abstract

The language used in electronic communications such as emails, chats and SMS texts shows special phenomena and important deviations from natural language. Typical machine translation approaches are difficult to adapt to SMS language because of the many irregularities it exhibits. This paper presents a new approach to SMS normalization that combines lexical and phonological translation techniques with disambiguation algorithms at two levels: lexical and semantic. The results obtained by the system outperform some existing SMS normalization methods, even though the corpus created for the evaluation has features that complicate the normalization task.
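The general idea of pairing lexical and phonological candidate generation with a context-based disambiguation step can be illustrated with a small sketch. It is not the authors' system: the abbreviation table, vocabulary, bigram counts, the crude phonetic key, and the function names `phonetic_key`, `candidates` and `normalize` are all hypothetical toy assumptions, and a single bigram score stands in for the lexical and semantic disambiguation levels the abstract mentions.

```python
import re

# Hypothetical toy resources (assumptions, not taken from the paper).
ABBREVIATIONS = {"u": ["you"], "c": ["see"], "2": ["to", "too", "two"]}
VOCABULARY = ["see", "you", "to", "too", "two", "great", "later", "tonight"]
BIGRAM_COUNTS = {("great", "to"): 2, ("to", "see"): 5, ("see", "you"): 7}
DIGIT_SOUNDS = {"2": "to", "4": "for", "8": "ate"}


def phonetic_key(token):
    """Crude sound-alike key: spell out digits, simplify a few letter
    groups, collapse repeated letters, drop vowels after the first char."""
    expanded = "".join(DIGIT_SOUNDS.get(ch, ch) for ch in token.lower())
    for pattern, repl in (("ght", "t"), ("ea", "a"), ("ph", "f")):
        expanded = expanded.replace(pattern, repl)
    collapsed = re.sub(r"(.)\1+", r"\1", expanded)
    return collapsed[0] + re.sub(r"[aeiouy]", "", collapsed[1:])


def candidates(token):
    """Union of lexical (dictionary) and phonological (key match) candidates."""
    if token in VOCABULARY:
        return [token]
    found = set(ABBREVIATIONS.get(token, []))
    key = phonetic_key(token)
    found.update(word for word in VOCABULARY if phonetic_key(word) == key)
    return sorted(found) or [token]  # unknown tokens pass through unchanged


def normalize(message):
    """Greedy left-to-right choice: keep the candidate that co-occurs
    most often with the previously chosen word."""
    output, previous = [], "<s>"
    for token in message.lower().split():
        best = max(candidates(token),
                   key=lambda word: BIGRAM_COUNTS.get((previous, word), 0))
        output.append(best)
        previous = best
    return " ".join(output)


print(normalize("gr8 2 c u 2nite"))
```

In this toy setup, "gr8" and "2nite" are resolved purely by the phonetic key, "c" and "u" by the abbreviation table, and the ambiguous "2" by its left context. A realistic system would need a far richer phonetic model and real corpus statistics.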


Similar resources

Learnability in a Minimalist Framework: Root Compounds, Merger, and the Syntax-Morphology Interface

The Minimalist Program in syntactic theory seeks to factor cross-linguistic variation out of the syntax, and entirely into other grammatical components, chiefly the lexicon and the interfaces with semantics and phonetics. At the same time, current Minimalist research accords a central role to the operation of "Merge," a generalized transformation combining two autonomous subtrees as daughters o...

Introduction to English Language and Linguistics – Reader

Chapter 2: Phonetics and Phonology; Chapter 3: Morphology; Chapter 4: Syntax ...

Grammar at the Borderline: A Case Study of P as a Lexical Category

What is an interface? Talk of interfaces arises when one is confronted with some boundary phenomenon that sits at a point of contact between two domains. In my area of study, namely syntax, people talk about the “syntax-phonology interface”, the “syntax-morphology interface”, or the “syntax-semantics interface”. So an interface can be understood as a borderline between two domains. This metapho...

Artistic Expression in the Making of Sa’di’s Sonnet

Is literature aesthetic in its form? If it is, what causes this beauty, creates this appeal and results in this artistic expression? Understanding the aesthetics of literature, poetry in particular, is made possible through the understanding of grammar (morphology, syntax, phonetics and semantics). Consequently, in the literary canon of the world, aesthetics are methodical, systematic and acces...

The Machine Translation Researches and Governmental View in Korea

Viewed from a broad perspective, in the seventies when we studied the basic technologies of NLP as a groundwork of MT, the focus of research was given to describing various phenomena of the Korean language in a linguistically significant way and processing the Korean characters mathematically or specific phenomena of the language logically with a computer. The theoretical linguistic description...


Publication date: 2011